The design of error-correcting codes used in modern communications relies on information
theory to quantify the capacity of a noisy channel to send information [1]. This capacity can be
expressed using the mutual information between input and output for a single use of the channel:
although correlations between subsequent input bits are used to correct errors, they cannot
increase the capacity. For quantum channels, it has been an open question whether entangled input
states can increase the capacity to send classical information [2]. The additivity conjecture [3, 4]
states that entanglement does not help, making practical computations of the capacity possible.
While additivity is widely believed to be true, there is no proof. Here we show that additivity is
false, by constructing a random counter-example. Our results show that the most basic question of
classical capacity of a quantum channel remains open, with further work needed to determine in
which other situations entanglement can boost capacity.

In the classical setting, Shannon presented a formal definition of a noisy channel $\mathcal{E}$
as a probabilistic map from input states to output states. In the quantum setting, the channel
becomes a linear, completely positive, trace-preserving map from density matrices to density
matrices, modeling noise in the system due to interaction with an environment. Such a channel can
be used to send either quantum or classical information. In the first case, a dramatic violation
of operational additivity was recently shown, in that there exist two channels, each of which has
zero capacity to send quantum information no matter how many times it is used, but which can be
used in tandem to send quantum information.

Here we address the classical capacity of a quantum channel. To specify how information is
encoded in the channel, we must pick a set of states $\rho_i$ which we use as input signals with
probabilities $p_i$. Then the Holevo formula [5] for the capacity is:

$$\chi = H\Big(\sum_i p_i\, \mathcal{E}(\rho_i)\Big) - \sum_i p_i\, H\big(\mathcal{E}(\rho_i)\big), \tag{1}$$

where $H(\rho) = -\mathrm{Tr}(\rho \ln(\rho))$ is the von Neumann entropy. The maximum capacity of a
channel is the maximum over all input ensembles:

$$\chi_{\max}(\mathcal{E}) = \max_{\{p_i\},\{\rho_i\}} \chi(\mathcal{E}, \{p_i\}, \{\rho_i\}). \tag{2}$$



Suppose we have two different channels, $\mathcal{E}_1$ and $\mathcal{E}_2$. To compute the
capacity of the combined channel $\mathcal{E}_1 \otimes \mathcal{E}_2$, it seems necessary to
consider entangled input states between the two channels. Similarly, when using the same channel
multiple times, it may be useful to use input states which are entangled across multiple uses of
the same channel. The additivity conjecture (see Figure 1) is the conjecture that this does not
help and that instead

$$\chi_{\max}(\mathcal{E}_1 \otimes \mathcal{E}_2) = \chi_{\max}(\mathcal{E}_1) + \chi_{\max}(\mathcal{E}_2). \tag{3}$$

The additivity conjecture makes it possible to compute the classical capacity of a quantum
channel. Further, Shor showed that several different additivity conjectures in quantum
information theory are all equivalent. These are the additivity conjecture for the Holevo
capacity, the additivity conjecture for entanglement of formation [5], strong superadditivity of
entanglement of formation [7], and the additivity conjecture for minimum output entropy [3]. In
this Letter, we show that all of these conjectures are false, by constructing a counterexample to
the last of these conjectures. Given a channel $\mathcal{E}$, define the minimum output entropy
$H^{\min}$ by

$$H^{\min}(\mathcal{E}) = \min_{|\psi\rangle} H\big(\mathcal{E}(|\psi\rangle\langle\psi|)\big). \tag{4}$$

The minimum output entropy conjecture is that for all channels $\mathcal{E}_1$ and
$\mathcal{E}_2$, we have

$$H^{\min}(\mathcal{E}_1 \otimes \mathcal{E}_2) = H^{\min}(\mathcal{E}_1) + H^{\min}(\mathcal{E}_2). \tag{5}$$

A counterexample to this conjecture would be an entangled input state which has a lower output
entropy, and hence is more resistant to noise, than any unentangled state (see Figure 2).

Our counterexample to the additivity of minimum output entropy is based on a random
construction, similar to those Winter and Hayden used to show violation of the maximal p-norm
multiplicativity conjecture for all $p > 1$ [5, 8, 10]. For $p = 1$, this violation would imply
violation of the minimum output entropy conjecture; however, the counterexample found in [3]
requires a matrix size which diverges as $p \to 1$. We use different system and environment sizes
(note that $D \ll N$ in our construction below) and make a different analysis of the probability
of different output entropies. Other violations are known for $p$ close to 0 [11].

We define a pair of channels $\mathcal{E}$ and $\bar{\mathcal{E}}$ which are complex conjugates
of each other. Each channel acts by randomly choosing a unitary from a small set of unitaries
$U_i$ ($i = 1 \ldots D$) and applying that to $\rho$. This models a situation in which the unitary
evolution of the system is determined by an unknown state of the environment. We define

$$\mathcal{E}(\rho) = \sum_{i=1}^{D} P_i\, U_i^\dagger \rho\, U_i, \qquad
\bar{\mathcal{E}}(\rho) = \sum_{i=1}^{D} P_i\, \bar{U}_i^\dagger \rho\, \bar{U}_i, \tag{6}$$

where the $U_i$ are N-by-N unitary matrices, chosen at random from the Haar measure, and the
probabilities $P_i$ are chosen randomly as described in the Supplemental Equations. The $P_i$ are
all roughly equal. We pick

$$1 \ll D \ll N. \tag{7}$$
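As a rough illustration of the construction in Eq. (6), the following Python sketch builds a channel from $D$ Haar-random unitaries and applies it to a random pure state. It is only a sketch under stated assumptions: the helper names (`haar_unitary`, `apply_channel`, `entropy`) are hypothetical, the probabilities are simply set equal to $1/D$ rather than drawn as in the Supplemental Equations, and the sizes are far too small to reach the regime $1 \ll D \ll N$.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # fix column phases so the result is Haar distributed

def apply_channel(rho, unitaries, probs):
    """E(rho) = sum_i P_i U_i^dagger rho U_i."""
    return sum(p * U.conj().T @ rho @ U for p, U in zip(probs, unitaries))

def entropy(rho):
    """Von Neumann entropy in nats."""
    ev = np.linalg.eigvalsh((rho + rho.conj().T) / 2)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

rng = np.random.default_rng(0)
N, D = 32, 4                        # illustrative sizes only (assumption)
unitaries = [haar_unitary(N, rng) for _ in range(D)]
probs = np.full(D, 1.0 / D)         # assumption: exactly equal P_i

psi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
psi /= np.linalg.norm(psi)
rho_out = apply_channel(np.outer(psi, psi.conj()), unitaries, probs)
S_out = entropy(rho_out)            # at most ln(D), since rho_out has rank <= D
```

For a typical random input the output entropy sits just below $\ln(D)$, which is the naive estimate that the lemmas in the Supplemental Equations refine.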

We show in the Supplemental Equations that

Theorem 1. For sufficiently large D, for sufficiently large N, there is a non-zero probability
that a random choice of $U_i$ from the Haar measure and of $P_i$ (as described in Supplemental
Equations) will give a channel $\mathcal{E}$ such that

$$H^{\min}(\mathcal{E} \otimes \bar{\mathcal{E}}) < H^{\min}(\mathcal{E}) + H^{\min}(\bar{\mathcal{E}}) = 2 H^{\min}(\mathcal{E}). \tag{8}$$

The size of N depends on D.

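For any pure state input, the output entropy of $\mathcal{E}$ is at most $\ln(D)$ and that of
$\mathcal{E} \otimes \bar{\mathcal{E}}$ is at most $2\ln(D)$. To show Theorem 1, we first exhibit
an entangled state with a lower output entropy for the channel
$\mathcal{E} \otimes \bar{\mathcal{E}}$. The entangled state we use is the maximally entangled
state:

$$|\Psi_{ME}\rangle = (1/\sqrt{N}) \sum_{\alpha=1}^{N} |\alpha\rangle \otimes |\alpha\rangle. \tag{9}$$

As shown in Lemma 1 in the Supplemental Equations, the output entropy for this state is bounded by

$$H\big(\mathcal{E} \otimes \bar{\mathcal{E}}(|\Psi_{ME}\rangle\langle\Psi_{ME}|)\big) \le 2\ln(D) - \ln(D)/D. \tag{10}$$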

We then use the random properties of the channel to show that no product state input can obtain
such a low output entropy. Lemmas 2-5 in the Supplemental Equations show that, with non-zero
probability, the entropy $H^{\min}(\mathcal{E})$ is at least $\ln(D) - \delta S^{\max}$, for

$$\delta S^{\max} = c_1/D + p_1(D)\, O\big(\sqrt{\ln(N)/N}\big), \tag{11}$$

where $c_1$ is a constant and $p_1(D) = \mathrm{poly}(D)$. Thus, since for large enough D, for
large enough N we have $2\delta S^{\max} < \ln(D)/D$, the theorem follows.



The output entropy can be understood differently: for a given pure state input, can we
determine from the output which of the unitaries $U_i$ was applied? Recall that

$$U \otimes \bar{U}\, |\Psi_{ME}\rangle = |\Psi_{ME}\rangle \tag{12}$$

for any unitary $U$. This means that, for the maximally entangled state, if a unitary $U_i$ was
applied to one subsystem, and $\bar{U}_i$ was applied to the other subsystem, we cannot determine
which unitary $i$ was applied by looking at the output. This is the key idea behind Eq. (10).
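The invariance in Eq. (12) is easy to check numerically. The sketch below is a minimal check, assuming a small dimension $N = 6$ and a hypothetical helper `haar_unitary`; it builds $|\Psi_{ME}\rangle$ as a flattened identity matrix and applies $U \otimes \bar{U}$.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(0)
N = 6                                            # illustrative dimension (assumption)
U = haar_unitary(N, rng)

# |Psi_ME> = (1/sqrt(N)) sum_a |a> (x) |a>; component (a, b) is delta_ab / sqrt(N).
psi_me = np.eye(N).reshape(N * N) / np.sqrt(N)

# Apply U (x) conj(U); component (b, c) becomes (U U^dagger)_{bc} / sqrt(N).
out = np.kron(U, U.conj()) @ psi_me
max_deviation = float(np.max(np.abs(out - psi_me)))
```

The deviation is at the level of floating-point round-off, because $\sum_a U_{ba} \bar{U}_{ca} = (U U^\dagger)_{bc} = \delta_{bc}$.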

Note that the minimum output entropy of $\mathcal{E}$ must be less than $\ln(D)$ by an amount at
least of order $1/D$. Suppose $U_1$ and $U_2$ are the two unitaries with the largest $P_i$. Choose
a state $|\psi\rangle$ which is an eigenvector of $U_1 U_2^\dagger$. For this state, we cannot
distinguish between the states $U_1^\dagger|\psi\rangle$ and $U_2^\dagger|\psi\rangle$, and so

$$H^{\min}(\mathcal{E}) \le \ln(D) - (2/D)\ln(2). \tag{13}$$

Our randomized analysis bounds how much further the output entropy of the channel $\mathcal{E}$
can be lowered for a random choice of $U_i$.
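The argument behind Eq. (13) can be illustrated numerically. The sketch below assumes exactly equal probabilities $P_i = 1/D$ (a simplification of the construction) and uses hypothetical helper names. It picks an eigenvector of $U_1 U_2^\dagger$, so that $U_1^\dagger|\psi\rangle$ and $U_2^\dagger|\psi\rangle$ agree up to a phase, and compares the output entropy with $\ln(D) - (2/D)\ln(2)$.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

def entropy(rho):
    """Von Neumann entropy in nats."""
    ev = np.linalg.eigvalsh((rho + rho.conj().T) / 2)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

rng = np.random.default_rng(1)
N, D = 16, 4                                   # illustrative sizes (assumption)
unitaries = [haar_unitary(N, rng) for _ in range(D)]

# Eigenvector of U_1 U_2^dagger: then U_2^dagger|psi> is a phase times U_1^dagger|psi>.
_, vecs = np.linalg.eig(unitaries[0] @ unitaries[1].conj().T)
psi = vecs[:, 0] / np.linalg.norm(vecs[:, 0])
overlap = abs(np.vdot(unitaries[0].conj().T @ psi, unitaries[1].conj().T @ psi))

rho_in = np.outer(psi, psi.conj())
rho_out = sum(U.conj().T @ rho_in @ U for U in unitaries) / D   # equal P_i = 1/D
S_out = entropy(rho_out)
bound = np.log(D) - (2.0 / D) * np.log(2.0)
```

The output is a mixture of at most $D - 1$ distinct pure states with weights $2/D, 1/D, \ldots, 1/D$, so its entropy cannot exceed the Shannon entropy of those weights, which is the right-hand side of Eq. (13).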

Our work raises the question of how strong a violation of additivity is possible. The relative
violation we have found is numerically small, but it may be possible to increase this, and to
find new situations in which entangled inputs can be used to increase channel capacity, or novel
situations in which entanglement can be used to protect against decoherence in practical devices.
The map $\mathcal{E}$ is similar to that used [12] to construct random quantum expanders [13],
raising the possibility that deterministic expander constructions can provide stronger violations
of additivity.

While we have used two different channels, it is also possible to find a single channel
$\mathcal{E}$ such that $H^{\min}(\mathcal{E} \otimes \mathcal{E}) < 2H^{\min}(\mathcal{E})$, by
choosing $U_i$ from the orthogonal group. Alternately, we can add an extra classical input used to
"switch" between $\mathcal{E}$ and $\bar{\mathcal{E}}$, as suggested to us by P. Hayden.

The equivalence of the different additivity conjectures means that the violation of any one of
the conjectures has profound impacts. The violation of additivity of the Holevo capacity means
that the problem of channel capacity remains open, since if a channel is used many times, we must
do an intractable optimization over all entangled inputs to find the maximum capacity. However,
we conjecture that additivity holds for all channels of the form

$$\mathcal{E} = \mathcal{F} \otimes \bar{\mathcal{F}}. \tag{14}$$

Our intuition for this conjecture is that we believe that multi-party entanglement (between the
inputs to three or more channels) is not useful, because it is very unlikely for all channels to
apply the same unitary; note that the state $|\Psi_{ME}\rangle$ has a low minimum output entropy
precisely because it is left unchanged as in Eq. (12) if both channels apply corresponding
unitaries. This two-letter additivity conjecture would allow us to restrict our attention to
considering input states with a bipartite entanglement structure, possibly opening the way to
computing the capacity for arbitrary channels.

FIG. 1: Communicating classical information over a quantum channel. A set of states $\rho_i$ are
used with probabilities $p_i$ as signal states on the channel. In (a), we use input states which
are unentangled between channels $\mathcal{E}$ and $\bar{\mathcal{E}}$. In (b), we allow
entanglement. The capacity of $\mathcal{E}$ is equal to that of $\bar{\mathcal{E}}$. The question
addressed is whether entangling, as shown in (b), can increase this capacity.

FIG. 2: Minimum output entropy of a quantum channel. A pure state is input to the channel. While
the input is a pure state, the output may be a mixed state. We attempt to minimize the entropy of
the output state over all pure input states. The question addressed is whether an entangled input
state, as shown in (b), can have a lower output entropy for channel
$\mathcal{E} \otimes \bar{\mathcal{E}}$ than the sum of the minimum output entropies for the two
channels.
Supplemental Equations:

To choose the $P_i$, we first choose a set of amplitudes $l_i$ as follows. For $i = 1, \ldots, D$ pick $l_i \ge 0$ independently from a
probability distribution with

$$P(l_i) \propto l_i^{2N-1} \exp(-N D\, l_i^2), \tag{15}$$

where the proportionality constant is chosen such that $\int_0^\infty P(l_i)\, dl_i = 1$. This distribution is the same as that of the
length of a random vector chosen from a Gaussian distribution in N complex dimensions. Then, define

$$L = \sqrt{\sum_{i=1}^{D} l_i^2}. \tag{16}$$

Then we set

$$P_i = l_i^2 / L^2, \tag{17}$$

so that

$$\mathcal{E}(\rho) = \sum_{i=1}^{D} \frac{l_i^2}{L^2}\, U_i^\dagger \rho\, U_i. \tag{18}$$

The only reason in what follows for not choosing all the probabilities equal to 1/D is that the choice we made will
allow us to appeal to certain exact results on random bipartite states later.
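The sampling of the probabilities can be sketched in a few lines of Python. The sketch rests on one observation that is not spelled out in the text: if $l$ has density proportional to $l^{2N-1}\exp(-N D\, l^2)$, then $t = l^2$ has density proportional to $t^{N-1}\exp(-N D\, t)$, which is a Gamma distribution with shape $N$ and scale $1/(ND)$. The variable names and the sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 400, 8                       # illustrative sizes (assumption)

# l_i^2 ~ Gamma(shape=N, scale=1/(N*D)), so that <l_i^2> = 1/D.
l_sq = rng.gamma(shape=N, scale=1.0 / (N * D), size=D)
l = np.sqrt(l_sq)

L = np.sqrt(np.sum(l_sq))           # normalization L with L^2 = sum_i l_i^2
P = l_sq / L**2                     # probabilities P_i = l_i^2 / L^2

relative_spread = float(np.max(np.abs(P * D - 1.0)))
```

The relative fluctuation of each $l_i^2$ is of order $1/\sqrt{N}$, which is why the $P_i$ are all roughly equal to $1/D$ when $N$ is large.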
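We also define the conjugate channel

$$\mathcal{E}^C(\rho) = \sum_{i=1}^{D} \sum_{j=1}^{D} \frac{l_i l_j}{L^2}\, \mathrm{Tr}\big(U_i^\dagger \rho\, U_j\big)\, |i\rangle\langle j|. \tag{19}$$

As shown in [15],

$$H^{\min}(\mathcal{E}) = H^{\min}(\mathcal{E}^C). \tag{20}$$

In the O(...) notation that follows, we will take

$$1 \ll D \ll N. \tag{21}$$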
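The relation between a channel and its conjugate can be checked on a single input. The sketch below assumes the conjugate channel has the standard complementary form $\mathcal{E}^C(\rho)_{ij} = (l_i l_j/L^2)\,\mathrm{Tr}(U_i^\dagger \rho\, U_j)$; the function names and sizes are hypothetical. For a pure input, the $D \times D$ output of $\mathcal{E}^C$ and the $N \times N$ output of $\mathcal{E}$ should share the same non-zero eigenvalues, and hence the same entropy.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

def channel(rho, Us, l):
    """E(rho) = sum_i (l_i^2 / L^2) U_i^dagger rho U_i."""
    L2 = np.sum(l**2)
    return sum((li**2 / L2) * U.conj().T @ rho @ U for li, U in zip(l, Us))

def conjugate_channel(rho, Us, l):
    """Assumed form: E^C(rho)_{ij} = (l_i l_j / L^2) Tr(U_i^dagger rho U_j)."""
    D, L2 = len(Us), np.sum(l**2)
    out = np.empty((D, D), dtype=complex)
    for i in range(D):
        for j in range(D):
            out[i, j] = l[i] * l[j] / L2 * np.trace(Us[i].conj().T @ rho @ Us[j])
    return out

rng = np.random.default_rng(3)
N, D = 12, 3                                    # illustrative sizes (assumption)
Us = [haar_unitary(N, rng) for _ in range(D)]
l = np.sqrt(rng.gamma(shape=N, scale=1.0 / (N * D), size=D))

chi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
chi /= np.linalg.norm(chi)
rho = np.outer(chi, chi.conj())

ev_E = np.linalg.eigvalsh(channel(rho, Us, l))[-D:]      # D largest eigenvalues
ev_EC = np.linalg.eigvalsh(conjugate_channel(rho, Us, l))
```

Since the two spectra agree for every pure input, minimizing the output entropy over inputs gives the same value for both channels, consistent with Eq. (20).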

We use "computer science" big-O notation throughout, rather than "physics" big-O notation. That is, if we state
that a quantity is $O(N)$, it means that it is asymptotically bounded by a constant times $N$, and may in fact be much
smaller. For example, $\sqrt{N}$ is $O(N)$ in computer science notation but not in physics notation.

Theorem 1 follows from two lemmas below, Lemmas 1 and 5, which give small corrections to the naive estimates of $2\ln(D)$
and $\ln(D)$ for the entropies. Lemma 1 upper bounds $H^{\min}(\mathcal{E} \otimes \bar{\mathcal{E}})$ by $2\ln(D) - \ln(D)/D$. Lemma 5 shows that for
given D, for sufficiently large N, with non-zero probability, the entropy $H^{\min}(\mathcal{E})$ is at least $\ln(D) - \delta S^{\max}$, for

$$\delta S^{\max} = c_1/D + p_1(D)\, O\big(\sqrt{\ln(N)/N}\big), \tag{22}$$

where $c_1$ is a constant and $p_1(D) = \mathrm{poly}(D)$. Thus, since for large enough D, for large enough N we have
$2\delta S^{\max} < \ln(D)/D$, the theorem follows.

Lemma 1. For any D and N, we have

$$H^{\min}(\mathcal{E} \otimes \bar{\mathcal{E}}) \le \frac{1}{D}\ln(D) + \frac{D-1}{D}\ln(D^2) = 2\ln(D) - \frac{1}{D}\ln(D). \tag{23}$$
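Proof. Consider the maximally entangled state, $|\Psi_{ME}\rangle = (1/\sqrt{N}) \sum_{\alpha=1}^{N} |\alpha\rangle \otimes |\alpha\rangle$. Then,

$$\mathcal{E} \otimes \bar{\mathcal{E}}(|\Psi_{ME}\rangle\langle\Psi_{ME}|)
= \Big(\sum_i \frac{l_i^4}{L^4}\Big) |\Psi_{ME}\rangle\langle\Psi_{ME}|
+ \sum_{i \ne j} \frac{l_i^2 l_j^2}{L^4} \big(U_i^\dagger \otimes \bar{U}_j^\dagger\big) |\Psi_{ME}\rangle\langle\Psi_{ME}| \big(U_i \otimes \bar{U}_j\big). \tag{24}$$

Since the states $|\Psi_{ME}\rangle\langle\Psi_{ME}|$ and $(U_i^\dagger \otimes \bar{U}_j^\dagger)|\Psi_{ME}\rangle\langle\Psi_{ME}|(U_i \otimes \bar{U}_j)$ are pure states, the entropy of the state in (24) is
bounded by

$$H\big(\mathcal{E} \otimes \bar{\mathcal{E}}(|\Psi_{ME}\rangle\langle\Psi_{ME}|)\big)
\le -\Big(\sum_i \frac{l_i^4}{L^4}\Big) \ln\Big(\sum_i \frac{l_i^4}{L^4}\Big)
- \sum_{i \ne j} \frac{l_i^2 l_j^2}{L^4} \ln\Big(\frac{l_i^2 l_j^2}{L^4}\Big). \tag{25}$$

To show Eq. (25), let $\rho_{ij} = (U_i^\dagger \otimes \bar{U}_j^\dagger)|\Psi_{ME}\rangle\langle\Psi_{ME}|(U_i \otimes \bar{U}_j)$. Note that $\rho_{ii} = \rho_{jj} = |\Psi_{ME}\rangle\langle\Psi_{ME}|$ for all $i, j$. Then, the
entropy is equal to

$$H\big(\mathcal{E} \otimes \bar{\mathcal{E}}(|\Psi_{ME}\rangle\langle\Psi_{ME}|)\big)
= -\mathrm{Tr}\Big[\Big(\sum_i \frac{l_i^4}{L^4}\Big) |\Psi_{ME}\rangle\langle\Psi_{ME}| \ln\big(\mathcal{E} \otimes \bar{\mathcal{E}}(|\Psi_{ME}\rangle\langle\Psi_{ME}|)\big)\Big]
- \sum_{i \ne j} \frac{l_i^2 l_j^2}{L^4}\, \mathrm{Tr}\Big[\rho_{ij} \ln\big(\mathcal{E} \otimes \bar{\mathcal{E}}(|\Psi_{ME}\rangle\langle\Psi_{ME}|)\big)\Big]. \tag{26}$$

Using the fact that the logarithm is an operator monotone function [16], we find that
$\ln\big(\mathcal{E} \otimes \bar{\mathcal{E}}(|\Psi_{ME}\rangle\langle\Psi_{ME}|)\big) \ge \ln\big(\sum_i \frac{l_i^4}{L^4} |\Psi_{ME}\rangle\langle\Psi_{ME}|\big)$,
and also that $\ln\big(\mathcal{E} \otimes \bar{\mathcal{E}}(|\Psi_{ME}\rangle\langle\Psi_{ME}|)\big) \ge \ln\big(\frac{l_i^2 l_j^2}{L^4} \rho_{ij}\big)$ for all $i \ne j$. Inserting these inequalities
into Eq. (26), we arrive at Eq. (25).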



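We claim that the right-hand side of Eq. (25) is bounded by

$$-\Big(\sum_i \frac{l_i^4}{L^4}\Big) \ln\Big(\sum_i \frac{l_i^4}{L^4}\Big)
- \sum_{i \ne j} \frac{l_i^2 l_j^2}{L^4} \ln\Big(\frac{l_i^2 l_j^2}{L^4}\Big)
\le \frac{1}{D}\ln(D) + \frac{D-1}{D}\ln(D^2). \tag{27}$$

To show Eq. (27), define $P_{same} = \sum_i l_i^4 / L^4$. We claim that $P_{same} \ge 1/D$. To see this, consider the real vectors
$(l_1^2/L^2, \ldots, l_D^2/L^2)$ and $(1, \ldots, 1)$. The inner product of these vectors is equal to 1 since $\sum_i l_i^2/L^2 = 1$, while the norms
of the vectors are $\sqrt{P_{same}}$ and $\sqrt{D}$, respectively. Applying the Cauchy-Schwarz inequality to this inner product, we
find that $P_{same} \ge 1/D$ as claimed. Then the left-hand side of Eq. (27) is equal to

$$\begin{aligned}
&-\Big(\sum_i \frac{l_i^4}{L^4}\Big) \ln\Big(\sum_i \frac{l_i^4}{L^4}\Big) - \sum_{i \ne j} \frac{l_i^2 l_j^2}{L^4} \ln\Big(\frac{l_i^2 l_j^2}{L^4}\Big) \\
&\quad = -P_{same}\ln(P_{same}) - (1 - P_{same})\ln(1 - P_{same})
 - (1 - P_{same}) \sum_{i \ne j} \frac{l_i^2 l_j^2}{(1 - P_{same}) L^4} \ln\Big(\frac{l_i^2 l_j^2}{(1 - P_{same}) L^4}\Big) \\
&\quad \le -P_{same}\ln(P_{same}) - (1 - P_{same})\ln(1 - P_{same}) + (1 - P_{same})\ln(D^2 - D).
\end{aligned} \tag{28}$$

The inequality holds because the weights $l_i^2 l_j^2 / ((1 - P_{same}) L^4)$ form a probability distribution over the $D^2 - D$ pairs $i \ne j$,
whose entropy is at most $\ln(D^2 - D)$. Subject to $P_{same} \ge 1/D$, the last line of Eq. (28) is maximized at $P_{same} = 1/D$, giving Eq. (27), which implies Eq. (23).

□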
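The bound of Lemma 1 can be checked directly on a small instance. The sketch below is illustrative only: the sizes are tiny, the helper names are hypothetical, and the $l_i^2$ are drawn from the Gamma distribution that corresponds to Eq. (15). It builds the output state of Eq. (24) term by term and compares its entropy with $2\ln(D) - \ln(D)/D$.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random n x n unitary via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))

def entropy(rho):
    """Von Neumann entropy in nats."""
    ev = np.linalg.eigvalsh((rho + rho.conj().T) / 2)
    ev = ev[ev > 1e-12]
    return float(-np.sum(ev * np.log(ev)))

rng = np.random.default_rng(4)
N, D = 6, 3                                      # illustrative sizes (assumption)
Us = [haar_unitary(N, rng) for _ in range(D)]
l_sq = rng.gamma(shape=N, scale=1.0 / (N * D), size=D)
P = l_sq / l_sq.sum()                            # P_i = l_i^2 / L^2

psi_me = np.eye(N).reshape(N * N) / np.sqrt(N)   # maximally entangled state

rho_out = np.zeros((N * N, N * N), dtype=complex)
for i in range(D):
    for j in range(D):
        # (U_i^dagger (x) conj(U_j)^dagger)|Psi_ME>, with conj(U_j)^dagger = U_j^T
        v = np.kron(Us[i].conj().T, Us[j].T) @ psi_me
        rho_out += P[i] * P[j] * np.outer(v, v.conj())

S_out = entropy(rho_out)
lemma1_bound = 2 * np.log(D) - np.log(D) / D
P_same = float(np.sum(P**2))                     # weight of the unchanged state
```

The diagonal terms $i = j$ all return $|\Psi_{ME}\rangle$ unchanged, so a weight $P_{same} \ge 1/D$ collects on a single pure state; that is the source of the entropy reduction.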



Lemma 2. Consider a random bipartite pure state $|\psi\rangle\langle\psi|$ on a bipartite system with subsystems B and E with
dimensions N and D respectively. Let $\rho_E$ be the reduced density matrix on E. Then, the probability density that $\rho_E$
has a given set of eigenvalues, $p_1, \ldots, p_D$, is bounded by

$$\begin{aligned}
P(p_1, \ldots, p_D) \prod_{i=1}^{D} dp_i
&\le O(N)^{O(D^2)}\, D^{(N-D)D}\, \delta\Big(1 - \sum_{i=1}^{D} p_i\Big) \prod_{i=1}^{D} p_i^{N-D}\, dp_i \\
&= O(N)^{O(D^2)}\, \delta\Big(1 - \sum_{i=1}^{D} p_i\Big) \prod_{i=1}^{D} F(p_i)\, dp_i,
\end{aligned} \tag{29}$$

where we define

$$F(p) = D^{N-D}\, p^{N-D} \exp\big[-(N-D)\, D\, (p - 1/D)\big]. \tag{30}$$

Note that $F(p) \le 1$ for all $0 \le p \le 1$.

Similarly, consider a random pure state $\rho = |\chi\rangle\langle\chi|$ on an N dimensional space, and a channel $\mathcal{E}^C(\ldots)$ as
defined in Eq. (19), with unitaries $U_i$ chosen randomly from the Haar measure and the numbers $l_i$ chosen as described
in Eq. (15) and with $N \ge D$. Then, the probability density that the eigenvalues of $\mathcal{E}^C(\rho)$ assume given values $p_1, \ldots, p_D$
is bounded by the same function $P(p_1, \ldots, p_D) \prod_i dp_i$ as above.

Proof. As shown in [17, 18], the exact probability distribution of eigenvalues is

$$P(p_1, \ldots, p_D) \prod_{i=1}^{D} dp_i \propto \delta\Big(1 - \sum_{i=1}^{D} p_i\Big) \prod_{1 \le j < k \le D} (p_j - p_k)^2 \prod_{i=1}^{D} p_i^{N-D}\, dp_i, \tag{31}$$

where the constant of proportionality is given by the requirement that the probability distribution integrate to unity.
The proportionality constant is $O(N)^{O(D^2)} D^{(N-D)D}$ as we show below, and for $0 \le p_i \le 1$

$$\prod_{1 \le j < k \le D} (p_j - p_k)^2 \prod_{i=1}^{D} p_i^{N-D} \le \prod_{i=1}^{D} p_i^{N-D}, \tag{32}$$

so Eq. (29) follows. The second equality in (29) holds because $\sum_i (p_i - 1/D) = 0$.

Given a random pure state $|\chi\rangle\langle\chi|$, with $U_i$ and $l_i$ chosen as described above, the state $\mathcal{E}^C(|\chi\rangle\langle\chi|)$ has the
same eigenvalue distribution as the reduced density matrix of a random bipartite state, so the second result follows.
To see that the eigenvalue distribution of a random bipartite state in DN dimensions is indeed the same as that
of $\mathcal{E}^C(|\chi\rangle\langle\chi|)$, we consider the reduced density matrix on the N dimensional system of the random bipartite state
and show that it has the same statistical properties as $\mathcal{E}(|\chi\rangle\langle\chi|)$. We choose the DN different amplitudes of the
unnormalized bipartite state from a Gaussian distribution. Equivalently, for each $i = 1, \ldots, D$ corresponding to a
given state in the environment, we choose an N dimensional vector $|v_i\rangle$ from a Gaussian distribution. Thus, before
normalization, the reduced density matrix of the random bipartite state on the N dimensional system has the same
statistics as the sum $\sum_{i=1}^{D} |v_i\rangle\langle v_i|$ where the $|v_i\rangle$ are states drawn from a Gaussian distribution. The state $\mathcal{E}(|\chi\rangle\langle\chi|)$
is the sum $\sum_{i=1}^{D} (l_i^2/L^2)\, U_i^\dagger |\chi\rangle\langle\chi| U_i$. The $l_i^2$ have the same statistics as $|v_i|^2$, while the directions of the vectors $U_i^\dagger|\chi\rangle$
are independent and uniformly distributed, as are the directions of the $|v_i\rangle$. The factor of $L^2$ takes into account the
normalization, so that $\mathcal{E}(|\chi\rangle\langle\chi|)$ indeed has the same statistics as the normalized bipartite state as claimed.

Finally, we show how to upper bound the proportionality constant. One approach is to keep track of constant
factors of N in the derivation of [17, 18]. Another approach, which we explain here, is to lower bound the integral
$\int \delta(1 - \sum_{i=1}^{D} p_i) \prod_{1 \le j < k \le D} (p_j - p_k)^2 \prod_{i=1}^{D} p_i^{N-D}\, dp_i$. As a lower bound on the integral, we restrict to a subregion of
the integration domain: we assume that the i-th eigenvalue $p_i$ falls into a narrow interval of width $1/N$, and we choose
these intervals such that $|p_i - p_j| \ge 1/N$ for $i \ne j$ and such that $|p_i - 1/D| \le O(D)/N$. To do this, for example,
we can require that the i-th eigenvalue $p_i$ obey $1/D + (2i - D - 3/2)/N \le p_i \le 1/D + (2i - D - 1/2)/N$. Then,
in this subregion, $\prod_{1 \le j < k \le D} (p_j - p_k)^2 \ge (1/N)^{D^2}$, and $\prod_{i=1}^{D} p_i^{N-D} \ge (1/D - O(D/N))^{(N-D)D}$. The centers of the
intervals were chosen such that if each eigenvalue is at the center, then $\sum_i p_i = 1$; we can then estimate the volume
of the subregion as $\approx \sqrt{1/D}\,(1/N)^{D-1}$. Combining these estimates, we lower bound the integral as desired. □

Remark: In order to get some understanding of the probability of having a given fluctuation in the entropy, we
consider a Taylor expansion about $p_i = 1/D$. The next three paragraphs are not intended to be rigorous and are
not used in the later proof. Instead, they are intended to, first, give some rough idea of the probability of a given
fluctuation in the entropy, and, second, explain why $\epsilon$-nets do not suffice to give sufficiently tight bounds on the
probability of having a given fluctuation in the entropy and hence why we turn to a slightly more complicated way of
estimating this probability in Lemmas 3-5.

If all the probabilities $p_i$ are close to $1/D$, so that $p_i = 1/D + \delta p_i$ for small $\delta p_i$, we can Taylor expand the last line
of (29), using $p_i^{N-D} = \exp[(N-D)\ln(p_i)]$, to get:

$$P(p_1, \ldots, p_D) \approx O(N)^{O(D^2)} \exp\Big[-(N-D)\, D^2 \sum_i \delta p_i^2 / 2 + \ldots\Big]. \tag{33}$$
Similarly, we can expand

$$S = -\sum_i p_i \ln(p_i) \approx \ln(D) - D \sum_i \delta p_i^2 / 2 + \ldots \tag{34}$$

Using Eqs. (33, 34), we find that the probability of having $S = \ln(D) - \delta S$ is roughly $O(N)^{O(D^2)} \exp[-(N-D)\, D\, \delta S]$.

Using $\epsilon$-nets, these estimates (33, 34) give some motivation for the construction we are using, but just fail to give a
good enough bound on their own: define an $\epsilon$-net with distance $d \ll 1$ between points on the net. There are then
$O(d^{-2N})$ points in the net. Then, the probability that, for a random $U_i$, $l_i$, at least one point on the net has a given
$\delta S$ is bounded by $\approx \exp[-N D\, \delta S + 2N\ln(1/d)]$. Thus, the probability of having a $\delta S = \ln(D)/2D$ is less than one for
$d \ge D^{-1/4}$. However, in order to use $\epsilon$-nets to show that it is unlikely to have any state $|\psi^0\rangle$ with given $\delta S^0$, we need to
take a sufficiently dense $\epsilon$-net. If there exists a state $|\psi^0\rangle$ with given $\delta S^0$, then any state within distance $d$ will have,
by Fannes inequality [19], a $\delta S \ge \delta S^0 - d^2 \ln(D/d^2)$, and therefore we will need to take a $d$ of roughly $1/\sqrt{D}$ in order
to use the bounds on $\delta S$ for points on the net to get bounds on $\delta S^0$ with an accuracy $O(1/D)$.

However, in fact this Fannes inequality estimate is usually an overestimate of the change in entropy. Given a
state $|\psi^0\rangle$ with a large $\delta S^0$, random nearby states $|\chi\rangle$ can be written as a linear combination of $|\psi^0\rangle$ with a random
orthogonal vector $|\phi\rangle$. Since $\mathcal{E}^C(|\phi\rangle\langle\phi|)$ will typically be close to a maximally mixed state for random $|\phi\rangle$, and typically
will also have almost vanishing trace with $\mathcal{E}^C(|\psi^0\rangle\langle\psi^0|)$, the state $\mathcal{E}^C(|\chi\rangle\langle\chi|)$ will typically be close to a mixture of
$\mathcal{E}^C(|\psi^0\rangle\langle\psi^0|)$ with the maximally mixed state, and hence will also have a relatively large $\delta S$. This idea motivates
what follows.

Definitions: We will say that a density matrix $\rho$ is "close to maximally mixed" if the eigenvalues $p_i$ of $\rho$ all obey

$$|p_i - 1/D| \le c_{MM} \sqrt{\ln(N)/(N-D)}, \tag{35}$$

where the constant $c_{MM}$ will be chosen later. For any given channel $\mathcal{E}^C$, let $P_{\mathcal{E}^C}$ denote the probability that, for a
randomly chosen $|\chi\rangle$, the density matrix $\mathcal{E}^C(|\chi\rangle\langle\chi|)$ is close to maximally mixed. Let $Q$ denote the probability that
a random choice of $U_i$ from the Haar measure and a random choice of numbers $l_i$ produces a channel $\mathcal{E}^C$ such that
$P_{\mathcal{E}^C}$ is less than 1/2. Note: we are defining $Q$ to be the probability of a probability here. Then,

Lemma 3. For an appropriate choice of $c_{MM}$, the probability $Q$ can be made arbitrarily close to zero for all sufficiently
large D and N/D.

Proof. The probability $Q$ is less than or equal to 2 times the probability that, for a random $U_i$, random $l_i$, and random
$|\chi\rangle$, the density matrix $\mathcal{E}^C(|\chi\rangle\langle\chi|)$ is not close to maximally mixed. From (29), and as we will explain further in the
next paragraph, this probability is bounded by the maximum over $p$ such that $|p - 1/D| \ge c_{MM}\sqrt{\ln(N)/(N-D)}$ of

$$\begin{aligned}
O(N)^{O(D^2)} F(p) &= O(N)^{O(D^2)}\, D^{N-D}\, p^{N-D} \exp[-(N-D)\, D\, (p - 1/D)] \\
&\approx \exp\big[O(D^2)\ln(N) - (N-D)\, D^2 c_{MM}^2 \big(\ln(N)/(N-D)\big)/2 + \ldots\big].
\end{aligned} \tag{36}$$

By picking $c_{MM}$ large enough, we can make this probability

$$\max_{p,\ |p - 1/D| \ge c_{MM}\sqrt{\ln(N)/(N-D)}} O(N)^{O(D^2)} F(p) \tag{37}$$

arbitrarily small for sufficiently large D and N/D.

The fact that $F(p) \le 1$ for all $0 \le p \le 1$ is important in the claim that (37) indeed is a bound on the given
probability. To compute the probability density for a given set of eigenvalues, $p_i$, such that for some $j$ we have
$|p_j - 1/D| \ge c_{MM}\sqrt{\ln(N)/(N-D)}$, we can use the bound $F(p) \le 1$ to show that $O(N)^{O(D^2)} \prod_{i=1}^{D} F(p_i)\, dp_i$ is
bounded by $O(N)^{O(D^2)} F(p_j) \prod_{i=1}^{D} dp_i$. Therefore, Eq. (37) gives a bound on the probability density under the
assumption that for some $j$ we have $|p_j - 1/D| \ge c_{MM}\sqrt{\ln(N)/(N-D)}$.

To turn this bound on the probability density into a bound on the probability, note that the total integration
volume $\int \delta(1 - \sum_{i=1}^{D} p_i) \prod_{i=1}^{D} dp_i$ is bounded by unity, and the set of $p_i$ such that for some $j$ we have
$|p_j - 1/D| \ge c_{MM}\sqrt{\ln(N)/(N-D)}$ is a subset of the set of all $p_i$.

Finally, note that the maximum of Eq. (37) is achieved at $|p - 1/D| = c_{MM}\sqrt{\ln(N)/(N-D)}$ and it is straightforward
to control the higher terms in the Taylor expansion of (36) in that case. □



The next lemma is the crucial step. 
Lemma 4. Consider a given choice of $U_i$ and $l_i$ which give a $\mathcal{E}^C$ such that $P_{\mathcal{E}^C} \ge 1/2$. Suppose there exists a state
$|\psi^0\rangle$ such that $\mathcal{E}^C(|\psi^0\rangle\langle\psi^0|)$ has given eigenvalues $p_1, \ldots, p_D$. Let $P_{near}$ denote the probability that, for a randomly
chosen state $|\chi\rangle$, the density matrix $\mathcal{E}^C(|\chi\rangle\langle\chi|)$ has eigenvalues $q_1, \ldots, q_D$ which obey

$$|q_i - (y\, p_i + (1 - y)(1/D))| \le \mathrm{poly}(D)\, O\big(\sqrt{\ln(N)/(N-D)}\big) \tag{38}$$

for some $y \ge 1/2$. Then,

$$P_{near} \ge \exp(-O(N))\,\big(1/2 - 1/\mathrm{poly}(D)\big), \tag{39}$$

where the power of D in the polynomial in (39) can be made arbitrarily large by an appropriate choice of the polynomial
in (38).

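Proof. Consider a random state $|\chi\rangle$. We can write $|\chi\rangle$ as a linear combination of $|\psi^0\rangle$ and a state $|\phi\rangle$ which is orthogonal
to $|\psi^0\rangle$ as follows:

$$|\chi\rangle = z\sqrt{1 - x^2}\, |\psi^0\rangle + x\, |\phi\rangle, \tag{40}$$

where $z$ is a phase: $|z| = 1$.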

For random $|\chi\rangle$, the probability that $x^2 \le 1/2$ is $\exp(-O(N))$. We can also calculate this probability exactly. Let
$S_n$ be the surface area of a unit hypersphere in n dimensions. Then, the probability that $x^2 \le 1/2$ is equal to

$$\begin{aligned}
S_{2N}^{-1} \int_0^{\pi/4} 2\pi \cos(\theta) \sin(\theta)^{2N-3}\, S_{2N-2}\, d\theta
&= (1/\sqrt{2})^{2N-2} \\
&= \exp[-\ln(2)(N-1)] \\
&= \exp(-O(N)).
\end{aligned} \tag{41}$$
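The closed form in Eq. (41) can be sanity-checked by Monte Carlo sampling. The sketch below is illustrative: it takes $|\psi^0\rangle$ to be the first basis vector without loss of generality, uses a small $N$ so that the event is not too rare, and draws random states as normalized complex Gaussian vectors. The sample size and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4                     # small dimension so the event x^2 <= 1/2 is not too rare
samples = 200_000         # Monte Carlo sample size (assumption)

# Random pure states: normalized complex Gaussian vectors in N dimensions.
chi = rng.standard_normal((samples, N)) + 1j * rng.standard_normal((samples, N))
chi /= np.linalg.norm(chi, axis=1, keepdims=True)

# With |psi^0> = e_0, Eq. (40) gives |<psi^0|chi>|^2 = 1 - x^2.
x_sq = 1.0 - np.abs(chi[:, 0]) ** 2

p_mc = float(np.mean(x_sq <= 0.5))
p_exact = 0.5 ** (N - 1)  # (1/sqrt(2))^(2N-2) = exp[-ln(2)(N-1)]
```

The exponential decay in $N$ is what fixes the threshold $c_2 > \ln(2)$ that appears when this probability is compared with the eigenvalue density bound in the proof of Lemma 5.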

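Since $|\chi\rangle$ is random, the probability distribution of $|\phi\rangle$ is that of a random state with $\langle\phi|\psi^0\rangle = 0$. One way to
generate such a random state $|\phi\rangle$ with this property is to choose a random state $|\theta\rangle$ and set

$$|\phi\rangle = \frac{1}{\sqrt{1 - |\langle\psi^0|\theta\rangle|^2}}\, \big(1 - |\psi^0\rangle\langle\psi^0|\big)\, |\theta\rangle. \tag{42}$$

If we choose a random state $|\theta\rangle$, then with probability at least 1/2, the state $\mathcal{E}^C(|\theta\rangle\langle\theta|)$ is close to maximally mixed.
Further, for any given $i, j$, the probability that $|\langle\psi^0|U_i U_j^\dagger|\theta\rangle|$ is greater than $O(\sqrt{\ln(D)/N})$ is $1/\mathrm{poly}(D)$, and the
polynomial $\mathrm{poly}(D)$ can be chosen to be any given power of D by appropriate choice of the constant hidden in the O
notation for $O(\sqrt{\ln(D)/N})$. Therefore,

$$\Pr\Big[\mathrm{Tr}\big(|\mathcal{E}^C(|\theta\rangle\langle\psi^0|)|\big) \ge \mathrm{poly}(D)\sqrt{\ln(D)/N}\Big] \le 1/\mathrm{poly}(D), \tag{43}$$

with any desired power of D in the polynomial on the right-hand side (the notation $\mathrm{Tr}(|\ldots|)$ is used to denote the
trace norm here).
Then, since

$$\Pr\Big[\big|\, |\phi\rangle - |\theta\rangle \big| \ge O\big(\sqrt{\ln(D)/N}\big)\Big] \le 1/\mathrm{poly}(D), \tag{44}$$

we find that

$$\Pr\Big[\mathrm{Tr}\big(|\mathcal{E}^C(|\phi\rangle\langle\psi^0|)|\big) \ge \mathrm{poly}(D)\sqrt{\ln(D)/N}\Big] \le 1/\mathrm{poly}(D), \tag{45}$$

with again any desired power in the polynomial.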

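The probability that $\mathcal{E}^C(|\theta\rangle\langle\theta|)$ is close to maximally mixed is at least 1/2, and so by (19, 44) the probability that
the eigenvalues $r_1, \ldots, r_D$ of $\mathcal{E}^C(|\phi\rangle\langle\phi|)$ obey

$$\begin{aligned}
|r_i - 1/D| &\le c_{MM}\sqrt{\ln(N)/(N-D)} + \mathrm{poly}(D)\, O\big(\sqrt{\ln(D)/N}\big) \\
&\le \mathrm{poly}(D)\, O\big(\sqrt{\ln(N)/N}\big)
\end{aligned} \tag{46}$$

is at least $1/2 - 1/\mathrm{poly}(D)$. Let

$$y = 1 - x^2. \tag{47}$$

Thus, since

$$\mathcal{E}^C(|\chi\rangle\langle\chi|) = (1 - x^2)\, \mathcal{E}^C(|\psi^0\rangle\langle\psi^0|) + x^2\, \mathcal{E}^C(|\phi\rangle\langle\phi|)
+ \Big(\bar{z}\, x \sqrt{1 - x^2}\; \mathcal{E}^C(|\phi\rangle\langle\psi^0|) + \mathrm{h.c.}\Big), \tag{48}$$

using Eq. (45) we find that for given $x$, the probability that a randomly chosen $|\phi\rangle$ gives a state with eigenvalues
$q_1, \ldots, q_D$ such that

$$|q_i - (y\, p_i + (1 - y)(1/D))| \le \mathrm{poly}(D)\, O\big(\sqrt{\ln(N)/N}\big) \tag{49}$$

is at least $1/2 - 1/\mathrm{poly}(D)$. Combining this result with the $\exp(-O(N))$ probability of $x^2 \le 1/2$, the claim of the lemma
follows. □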

We now give the last lemma which shows a lower bound, with non-zero probability, on $H^{\min}(\mathcal{E}^C)$. The basic idea
of the proof is to estimate the probability that a random state input into a random channel $\mathcal{E}^C$ gives an output state
with moderately low output entropy (defined slightly differently below in terms of properties of the eigenvalues of the
output density matrix). We estimate this probability in two different ways. First, we estimate the probability of such
an output state conditioned on $\mathcal{E}^C$ being chosen such that there exists some input state with an output entropy less
than $\ln(D) - \delta S^{\max}$. Next, we estimate the probability of such an output state, without any conditioning on $\mathcal{E}^C$. By
comparing these estimates, we are able to bound the probability of $\mathcal{E}^C$ having an input state which gives an output
entropy less than $\ln(D) - \delta S^{\max}$.

Lemma 5. If the unitary matrices $U_i$ are chosen at random from the Haar measure, and the $l_i$ are chosen randomly
as described above, then the probability that $H^{\min}(\mathcal{E}^C)$ is less than $\ln(D) - \delta S^{\max}$ is less than one for sufficiently large
N, for appropriate choice of $c_1$ and $p_1$. The N required depends on D.

Proof. Let $P_{bad}$ denote the probability that $H^{\min}(\mathcal{E}^C) < \ln(D) - \delta S^{\max}$. Then, with probability at least $P_{bad} - Q$,
for random $U_i$ and $l_i$, the channel $\mathcal{E}^C$ has $P_{\mathcal{E}^C} \ge 1/2$ and has $H^{\min}(\mathcal{E}^C) < \ln(D) - \delta S^{\max}$.

Let $|\psi^0\rangle$ be a state which minimizes the output entropy of channel $\mathcal{E}^C$. By Lemma 4, for such a channel, for a
random state $|\chi\rangle$, the density matrix $\mathcal{E}^C(|\chi\rangle\langle\chi|)$ has eigenvalues $q_1, \ldots, q_D$ which obey

$$|q_i - (y\, p_i + (1 - y)(1/D))| \le \mathrm{poly}(D)\, O\big(\sqrt{\ln(N)/N}\big) \tag{50}$$

for $y \ge 1/2$ with probability at least

$$\exp(-O(N))\,\big(1/2 - 1/\mathrm{poly}(D)\big). \tag{51}$$

Therefore, for a random choice of $U_i$, $l_i$, $\chi$, the state $\mathcal{E}^C(|\chi\rangle\langle\chi|)$ has eigenvalues $q_i$ which obey Eq. (50) with
probability at least

$$(P_{bad} - Q)\, \exp(-O(N))\,\big(1/2 - 1/\mathrm{poly}(D)\big). \tag{52}$$

However, by Eq. (29), the probability of having such eigenvalues $q_i$ is bounded by the maximum of the probability
density $P(q_1, \ldots, q_D)$ over $q_i$ which obey Eq. (50). Given the assumptions that $-\sum_i p_i \ln(p_i) \le \ln(D) - \delta S^{\max}$, $y \ge 1/2$,
and the constraint that $\sum_i p_i = 1$, the quantity $P(q_1, \ldots, q_D) \le O(N)^{O(D^2)} \exp[-c_2(N-D)]$, where $c_2$ can be made
arbitrarily large by choosing $c_1$ large (the proof of this statement is given in the next paragraph). We pick $c_{MM}$ so
that $Q < 1$ and then if $P_{bad} = 1$ we can pick $c_1$ and $p_1$ such that for sufficiently large N this quantity $P(q_1, \ldots, q_D)$
is less than that in (52), giving a contradiction. Comparing to the discussion below Eq. (40), we see that we need
$c_2 > \ln(2)$ to get this contradiction. Therefore, $P_{bad} < 1$. In fact, since $Q$ can be made arbitrarily close to zero, $P_{bad}$
can be made arbitrarily close to zero for sufficiently large D, N.

Finally, we briefly show how $c_2$ can be made arbitrarily large by choosing $c_1$ sufficiently large. The natural way
to do this is by treating this problem as a constrained maximization problem: maximize the probability $P(q_1, \ldots, q_D)$
subject to a constraint on the entropy of the $p_i$. This maximization can be done with Lagrange multipliers, and the
final result is obtained after a direct, but slightly lengthy, calculation. We now show a slightly different way to obtain
the same result. First, we claim that we can find constants $x, y$ with $0 < x < 1 < y$, such that the probability that an
eigenvalue $q_i$ falls outside the interval $(x/D, y/D)$ is bounded by $O(N)^{O(D^2)} \exp[-c_2(N-D)]$ for any desired $c_2$. To
show this claim, we use the fact that this probability is bounded by $O(N)^{O(D^2)} \max(F(x/D), F(y/D))$. The function
$F(x/D) = \exp[-(N-D)(x-1) + (N-D)\ln(x)] = \exp[(N-D)(-x + 1 + \ln(x))]$. We choose $x$ sufficiently small that
$-x + 1 + \ln(x) \le -c_2$ and similarly we choose $y$ sufficiently large that $-y + 1 + \ln(y) \le -c_2$, and then we have bounded
the probability of any eigenvalue $q_i$ lying outside this interval. Thus, we can assume that the eigenvalues lie inside
this interval.

Next, for any set of eigenvalues $\{q_i\}$ which all lie in this interval, we have

$$\prod_i F(q_i) \le \exp\Big[-(N-D)\, D^2 \sum_i (q_i - 1/D)^2 / 2y^2\Big]. \tag{53}$$

Comparing Eq. (53) to Eq. (33), we have worsened by a constant in the exponent ($1/2y^2$ instead of $1/2$), but the
inequality is now valid for all $q_i$ in the interval $(x/D, y/D)$, not just as a Taylor expansion. We now also give a bound
on the entropy. For any set of eigenvalues of the density matrix $p_i$, we have

$$\begin{aligned}
S(\{p_i\}) &= -\sum_i p_i \ln(p_i) \\
&= \ln(D) - \delta S \\
&\ge \ln(D) - D \sum_i (p_i - 1/D)^2.
\end{aligned} \tag{54}$$


To derive Eq. (54), note that $-\sum_i p_i \ln(p_i) = \ln(D) + \sum_i [(1/D)\ln(1/D) - p_i \ln(p_i) + (p_i - 1/D)(\ln(1/D) + 1)]$,
because $\sum_i (p_i - 1/D) = 0$. Then, $\delta S = -\sum_i [(1/D)\ln(1/D) - p_i \ln(p_i) + (p_i - 1/D)(\ln(1/D) + 1)]$. For $p_i = 1/D$,
$-[(1/D)\ln(1/D) - p_i \ln(p_i) + (p_i - 1/D)(\ln(1/D) + 1)] = 0$, while for $p_i = 0$, it is equal to $1/D$. The function
$D(p_i - 1/D)^2$ is a quadratic function chosen to fit these two points (0 at $p_i = 1/D$ and $1/D$ at 0), and both
$D(p_i - 1/D)^2$ and $(1/D)\ln(1/D) - p_i \ln(p_i) + (p_i - 1/D)(\ln(1/D) + 1)$ have vanishing derivative at $p_i = 1/D$; it was
to make the derivative vanish that we subtracted off that linear term. By checking the sign of the third derivative of
$-p_i \ln(p_i)$ one may verify the inequality (54).
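The entropy bound of Eq. (54) and the Taylor estimate of Eq. (34) can both be checked numerically. The sketch below is a sanity check, not part of the proof: it draws random probability vectors from a flat Dirichlet distribution (an arbitrary choice made here) and verifies $S \ge \ln(D) - D\sum_i (p_i - 1/D)^2$, then compares the entropy of a near-uniform vector with the second-order expansion. Writing $u = D p_i$, the bound reduces term by term to $u\ln(u) - u + 1 \le (u - 1)^2$, which follows from $\ln(u) \le u - 1$.

```python
import numpy as np

def shannon_entropy(p):
    """Entropy -sum p ln p along the last axis, with 0 ln 0 treated as 0."""
    safe = np.where(p > 0, p, 1.0)
    return -np.sum(p * np.log(safe), axis=-1)

rng = np.random.default_rng(6)
D = 10                                            # illustrative dimension (assumption)

# Inequality (54) on random probability vectors.
p = rng.dirichlet(np.ones(D), size=10_000)
S = shannon_entropy(p)
lower = np.log(D) - D * np.sum((p - 1.0 / D) ** 2, axis=1)
worst_margin = float(np.min(S - lower))           # non-negative if (54) holds

# Taylor estimate (34) for a small, zero-sum perturbation of the uniform vector.
delta = 1e-3 * (np.arange(D) - (D - 1) / 2) / D
p_near = 1.0 / D + delta
S_near = float(shannon_entropy(p_near))
S_taylor = float(np.log(D) - D * np.sum(delta**2) / 2)
```

The quadratic bound is looser than the Taylor estimate by the factor of 2 in the coefficient, but it holds for every probability vector rather than only near $p_i = 1/D$.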



Comparing Eq. (54) to Eq. (34), we have lost the factor of 1/2 in (54), but the result is now an inequality valid for
all $p_i$, not just a Taylor expansion. Comparing Eq. (53) and Eq. (54) and using Eq. (50), we find

$$\prod_i F(q_i) \le \exp\Big\{-\frac{(N-D)\, D\, \big[\delta S - \mathrm{poly}(D)\, O\big(\sqrt{\ln(N)/N}\big)\big]}{8y^2}\Big\}, \tag{55}$$

and so we can make $c_2$ arbitrarily large by choosing sufficiently large $c_1$.



□ 



This completes the proof of the theorem. 